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Abstract 

Low-rank structure have been profoundly studied in data mining and 
machine learning. In this paper, we show a dense matrix X's low-rank ap- 
proximation can be rapidly built from its left and right random projections 
Y\ = XA\ and Y% = X 7 A 2 , or bilateral random projection (BRP). We then 
show power scheme can further improve the precision. The deterministic, 
average and deviation bounds of the proposed method and its power scheme 
modification are proved theoretically. The effectiveness and the efficiency of 
BRP based low-rank approximation is empirically verified on both artificial 
and real datasets. 



1 Introduction 

Recent researches about low-rank structure concentrate on developing fast ap- 
proximation and building meaningful decompositions. Two appealing represen- 
tatives are the randomized approximate matrix decomposition j3l and column se- 
lection [HI. The former proves that a matrix can be well approximated by its pro- 
jection to the column space of its random projections. This rank-revealing method 
provides a fast approximation of SVD/PCA. The latter proves that a column subset 
of a low-rank matrix can span its whole range. 

In this paper, we consider the problem of fast low-rank approximation. Given r 
bilateral random projections (BRP) of an m x n dense matrix X (w.l.o.g, m > n), 
i.e., Yl = XAi and Y 2 = X T A 2 , wherein A x e W lXr and A 2 e W nxr are random 
matrices, 

L = Y l (A T 2 Y i y l Y 2 T (1) 

is a fast rank-r approximation of X. The computation of L includes an inverse 
of an r x r matrix and three matrix multiplications. Thus, for a dense X, 2mnr 
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floating-point operations (flops) are required to obtain BRP, r 2 (2n+r)+mnr flops 
are required to obtain L. The computational cost is much less than SVD based 
approximation. The L in £0 has been proposed in $2% as a recovery of a rank- 
r matrix X from Yy and Y%, where Ay and A 2 are independent Gaussian/SRFT 
random matrices. However, we propose that L is a tight rank-r approximation of 
a full rank matrix X, when Ay and A 2 are correlated random matrices updated 
from Y 2 and Yy, respectively. We then apply power scheme (61 to L for improving 
the approximation precision, especially when the eigenvalues of X decay slowly. 

Theoretically, we prove the deterministic bound, average bound and deviation 
bound of the approximation error in BRP based low-rank approximation and its 
power scheme modification. The results show the error of BRP based approxima- 
tion is close to the error of SVD approximation under mild conditions. Comparing 
with randomized SVD in [3] that extracts the column space from unilateral ran- 
dom projections, the BRP based method estimates both column and row spaces 
from bilateral random projections. 

We give an empirical study of BRP on both artificial data and face image 
dataset. The results show its effectiveness and efficiency in low-rank approxima- 
tion and recovery. 

2 Bilateral random projections (BRP) based low-rank 
approximation 

We first introduce the bilateral random projections (BRP) based low-rank approx- 
imation and its power scheme modification. The approximation error bounds of 
these two methods are discussed at the end of this section. 

2.1 Low-rank approximation with closed form 

In order to improve the approximation precision of L in (Q~|) when Ay and A 2 
are standard Gaussian matrices, we use the obtained right random projection Yy 
to build a better left projection matrix A 2 , and use Y 2 to build a better A\. In 
particular, after Y\ = XAy, we update A 2 = Yy and calculate the left random 
projection Y 2 = X T A 2 , then we update Ay = Y 2 and calculate the right random 
projection Yy = XAy. A better low-rank approximation L will be obtained if the 
new Yy and Y 2 are applied to (QQ). This improvement requires additional flops of 
mnr in BRP calculation. 
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2.2 Power scheme modification 



When singular values of X decay slowly, (Q~|) may perform poorly. We design 
a modification for this situation based on the power scheme (61. In the power 
scheme modification, we instead calculate the BRP of a matrix X = (XX T ) q X, 

whose singular values decay faster than X. In particular, \i(X) = Xi(X) 2q+1 . 
Both X and X share the same singular vectors. The BRP of X is: 

Y 1 = XA 1 ,Y 2 = X T A 2 . (2) 

According to (Q~|), the BRP based r rank approximation of X is: 

L = Y 1 (A^Y 1 )' 1 Y 2 T . (3) 

In order to obtain the approximation of X with rank r, we calculate the QR de- 
composition of Yi and Y 2 , i.e., 

Y l = Q 1 R h Y 2 = Q 2 R 2 . (4) 

The low-rank approximation of X is then given by: 

L=(L)***=Q 1 [ifc (J$Yi) 1 Rl] ^ Q T 2 . (5) 

The power scheme modification © requires an inverse of an r x r matrix, an 
SVD of an r x r matrix and five matrix multiplications. Therefore, for dense X, 
2(2g + l)mnr flops are required to obtain BRP, r 2 (m + n) flops are required to 
obtain the QR decompositions, 2r 2 (n + 2r) + mnr flops are required to obtain L. 
The power scheme modification reduces the error of © by increasing q. When 
the random matrices A 1 and A 2 are built from Y 1 and Y 2 , mnr additional flops are 
required in the BRP calculation. 



3 Approximation error bounds 

We analyze the error bounds of the BRP based low-rank approximation © and its 
power scheme modification ©. 

The SVD of an m x n (w.l.o.g, m > n) matrix X takes the form: 

X = UAV T = UxAxV? + U 2 A 2 V? } (6) 
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where Ai is an r x r diagonal matrix which diagonal elements are the first largest 
r singular values, U x and V x are the corresponding singular vectors, A 2 , U 2 and V 2 
forms the rest part of SVD. Assume that r is the target rank, A x and A 2 have r + p 
columns for oversampling. We consider the spectral norm of the approximation 
error E for (OQ): 



IX -LI 



X-Y x (AlY x ) l Y, 
I-XA X (AlXA^ 1 A 



X 



(7) 



The unitary invariance of the spectral norm leads to 



\X - L| 



U 



A 



/ - XA X (A\XA X ) 1 A 
I-V T A X (AlXA^A^UK 



X 



(8) 



In low-rank approximation, the left random projection matrix A 2 is built from 
the left random projection Y x = XA X , and then the right random projection matrix 
A x is built from the left random projection Y 2 = X T A 2 . Thus A 2 = Y x = 
XA X = UAV T A X and A x = Y 2 = X T A 2 = X T XA X = VA 2 V T A X . Hence the 
approximation error given in © has the following form: 



A 



A 2 V T A X (A^VA^A,)' 1 AlVK 2 



(9) 



The following Theorem Q] gives the bound for the spectral norm of the deter- 
ministic error \\X — L\\. 

Theorem 1. (Deterministic error bound) Given an m x n{m > n) real matrix 
X with singular value decomposition X = UAV T = U X A 1 V 1 T + U 2 A 2 V 2 T , and 
chosen a target rank r < n — 1 and an n x (r + p) (p > 2) standard Gaussian 
matrix A x , the BRP based low-rank approximation (UJ) approximates X with the 
error upper bounded by 

\\X - Lf < \\A\ (V 2 T A X ) (VfA^A^W 2 + ||A 2 || 2 . 
See Section H] for the proof of Theorem [T] 

If the singular values of X decay fast, the first term in the deterministic error 
bound will be very small. The last term is the rank-r SVD approximation error. 
Therefore, the BRP based low-rank approximation (OQ) is nearly optimal. 
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Theorem 2. (Deterministic error bound, power scheme) Frame the hypotheses 
of Theorem \T\ the power scheme modification (T5]) approximates X with the error 
upper bounded by 



\X-L\\ 2 < 



t A -(29+1) 



+iiAnr) 1/(2,+1> - 

See Section |4] for the proof of Theorem [21 

If the singular values of X decay slowly, the error produced by the power 
scheme modification © is less than the BRP based low-rank approximation (Q]) 
and decreasing with the increasing of q. 

The average error bound of BRP based low-rank approximation is obtained 
by analyzing the statistical properties of the random matrices that appear in the 
deterministic error bound in Theorem [T] 

Theorem 3. (Average error bound) Frame the hypotheses of Theorem^ 



MX-LW < 



+ 1 I |A 



r+1 1 



+ 



ey/r + p 
P 



" \2 
i=r+l r 



See Section|4]for the proof of Theorem[3j 

The average error bound will approach to the SVD approximation error | A r+ i | 

if |A r +i| <C | \i:i=l,— , r \ an d |A r | ^> |Ai : i =r +i 5 ... )ri |. 

The average error bound for the power scheme modification is then obtained 
from the result of Theorem [3j 

Theorem 4. (Average error bound, power scheme) Frame the hypotheses of 
TheoremU} the power scheme modification (0) approximates X with the expected 
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error upper bounded by 



E\\X - L\\ < 




A 



2g+l 

T + l 



1/(29+1) 



1 



/ j n x 2(2g+l) 

e^/r + p \ 




P \ A 2 

1 \ i=r+l A r 



See Section |4] for the proof of Theorem HI 

Compared the average error bounds of the BRP based low-rank approximation 
with its power scheme modification, the latter produces less error than the former, 
and the error can be further decreased by increasing q. 

The deviation bound for the spectral norm of the approximation error can be 
obtained by analyzing the deviation bound of ||A| (V^Ai) (V^Ai^Af 1 || in the 
deterministic error bound and by applying the concentration inequality for Lips- 
chitz functions of a Gaussian matrix. 

Theorem 5. (Deviation bound) Frame the hypotheses ofTheoremUl Assume that 
p > 4. For all u,t > 1, it holds that 



except with probability e"" 2/2 + At~ p + 

See Section|4]for the proof of Theorem[5l 

4 Proofs of error bounds 
4.1 Proof of Theorem H 

The following lemma and propositions from [3J will be used in the proof. 
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Lemma 1. Suppose that M y 0. For every A, the matrix A 1 'MA >z 0. In 
particular, 

M A T MA ■< A T NA. (10) 

Proposition 1. Suppose range(iV) C range(M). Then, for each matrix A, it 
holds that \\V N A\\ < \\V M A\\ and that - V M )A\\ < \\{I - V N )A\\. 



Proposition 2. Suppose that M y 0. Then 



I - (I + My 1 r< M. 



(11) 



Proposition3. We have \\ M \\ < \\ A\\ + \\C\\ for each partitioned positive semidef- 
inite matrix 

A B 
B T C 



M 



(12) 



The proof of Theorem Q] is given below. 



Proof. Since an orthogonal projector projects a given matrix to the range (column 
space) of a matrix M is defined as Vm = M(M T M)~ 1 M T , the deterministic 
error © can be written as 



\E\ 



(13) 



\\A(I-V M )\\, M = AT4 

By applying Proposition[T]to the error (fT3l) . because range(M(V^ r AiY A^ 2 ) C 
range (M), we have 



where 



N 



A(I-V M )\\ < \\A(I-V N 
(VfAjA? = 



AfV 1 T A 1 



AW 2 T A 1 



I 
H 



(14) 
(15) 



Thus (/ — Vn) can be written as 
I-V N = 



+ H T H) ~ L - (I + H T H) ~ l H 
-#(/ + # T #P I-H(I + H T H)~ 1 H 



T 
1 „ T 



For the top-left block in CHI]), Proposition leads to / - (J + # T #) 1 H 
# T if. For the bottom-right block in CHI), Lemma[[]leads to I-H (I + iJ T X 

/. Therefore, 



I-V N < 



H T H -(/ + H T Hy 1 H T 

-H (/ + H T H) 1 I 
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By applying Lemma [Q we have 

A(I-V N )A± 

A^^HA, 
-AlH (l + H T H)~ l Ai 



-A? {I + H T H)- 1 H T A 2 



A T 2 A 2 



According to Proposition [3l the spectral norm of A(I — Vn) is bounded by 

\\A{I-V N )\\ 2 = \\A(I-V N )A\\ 

< \\A 2 2 (V 2 T A 1 ) (V^A^A^f + ||A 2 f . (16) 

By substituting (fT6l) into (fl4l) . we obtain the deterministic error bound. This 
completes the proof. □ 

4.2 Proof of Theorem |2] 

The following proposition from [3] will be used in the proof. 

Proposition 4. Let V be an orthogonal projector, and let Abe a matrix. For each 
nonnegative q, 

\T\1 ,111/(29+1) 



\\VA\\ < \\V(AA T ) q A\ 
The proof of Theorem [2] is given below. 



(17) 



Proof. The power scheme modification © applies the BRP based low-rank ap- 
proximation © to X = (XX T ) q X = UA 2q+1 V T rather than X. In this case, the 
approximation error is 



||1 - L\\ = \\A 2q+1 (I - V M )\\ , M = A 2{ ? q+1 W T A x . 
According to Theorem [TJ the error is upper bounded by 



(18) 



X-L 



< 



Af« +1) {VIA,) (Vf AO+AI 



(29+1) 



I a 2g+l I 
yi 2 



(19) 



The deterministic error bound for the power scheme modification is obtained by 
applying Proposition|4]to (fT9l) . This completes the proof. □ 
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4.3 Proof of Theorem g| 

The following propositions from [3] will be used in the proof. 

Proposition 5. Fix matrices S, T, and draw a standard Gaussian matrix G. Then 
it holds that 



¥,\\SGT T \\ < ||5||||T|| F +||5||,||r||. 



(20) 



Proposition 6. Draw an r x (r + p) standard Gaussian matrix G with p > 2. 
Then it holds that 



E||Gt|| 



p — 1 

The proof of Theorem [3] is given below. 



P 



(21) 



Proof. The distribution of a standard Gaussian matrix is rotational invariant. Since 
1) A\ is a standard Gaussian matrix and 2) V is an orthogonal matrix, V T Ai is a 
standard Gaussian matrix, and its disjoint submatrices V^Ai and V% A\ are stan- 
dard Gaussian matrices as well. 

Theorem Q] and the Holder's inequality imply that 



E||X - L\\ < E (||A* (Vrf Ai) (V^A^A^f + ||A 2 || 2 ) 
< E || Al (V 2 T A 1 ) (VfA^A^W + ||A 2 ||. 



1/2 



(22) 



We condition on V± A\ and apply Proposition [5] to bound the expectation w.r.t. 



V?Ai, i.e., 



^ H^Vl {VIA,) (VfA^A^W 

< E (UMII WiV^A^A^ + ||A^|| F ||(Vf A^tArl) 

<||as|| (E||(^A 1 ) t Ar 1 |&) 1/2 + 
llA^i^.EiKvf^n-iiAr 1 !!. 

The Frobenius norm of (V^ A t y A^ 1 can be calculated as 



(23) 



IGWAr 1 ! 



trace 



trace 



Ar 1 ((VfAin (Vf AO+Ar 1 
((Ai^) (A x Vf ^) 
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Since 1) V^Ai is a standard Gaussian matrix and 2) A x is a diagonal matrix, each 
column of AiV^ r A 1 follows r-variate Gaussian distribution A/" r (0, A 2 ). Thus the 

random matrix ({A{V^Ax) [AiV^Ai) T j follows the inverted Wishart distri- 



bution W r 1 (A 1 2 , r + p). According to the expectation of inverted Wishart distri 
bution Hi, we have 



e\\(v?aJa? 

= E trace 



I 2 



= trace E 
1 

T 



V 



(AxlfAx) {A l V^A l f 
(A 1 V^A 1 ) (A 1 V^A 1 f 



We apply Proposition |6] to the standard Gaussian matrix A\ and obtain 



E || (y T Ai)t ||<£^±P_ 



Therefore, (1231) can be further derived as 

£||a 2 (^A)^ 1 !! 



\ 



P 



i=i 



\ 



<ZT?1 P 



i=r+l 



IA 



T+l 



\ 



\ 



1 \2 

A 2 ' 

i=r+l V 



By substituting (1261) into ([22]) . we obtain the average error bound 

Epr-L|| < 



A 



P 



i=l 



r X 2 

A r+1 



A 



1 |Ar+i| + 



e^r + p 
V 



X 2 

X2 



^ i=r+l 



This completes the proof. 



(24) 



(25) 



(26) 



(27) 
□ 
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4.4 Proof of Theorem S 

The proof of Theorem |4] is given below. 

Proof. By using Holder's inequality and Theorem[2l we have 

E || AT - i|| < (E||X-L|| 2?+1 ) 1/(29+1) 

1/(29+1) 



< E 



X-L 



(28) 



We apply Theorem [3] to X and L and obtain the bound of E||Jf — L\\, noting that 



E 



X-L 



I 1 r ^2(29+1) \ 



ey/r + p 
P 



\ E x 

\ i=r+l A 



,2(2<z+l) 



(29) 



By substituting (1291) into (1281) . we obtain the average error bound of the power 
scheme modification shown in Theorem |H This completes the proof. □ 



4.5 Proof of Theorem H 

The following propositions from [3 J will be used in the proof. 
Proposition 7. Suppose that h is a Lip schitz function on matrices: 

\h(X) - h(Y)\ < L\\X - F\\ F forallX.Y. 
Draw a standard Gaussian matrix G. Then 

Pr{/i(G) > Eh(G) + Li} < e~* 2/2 . 



(30) 



(31) 



Proposition 8. Let G be a r x (r + p) standard Gaussian matrix where p > 4. 
For all t > 1, 



P *{\\ G %>)j J f- t }<^ P and 
Pr[||Gt||>^^.A< r 



f-(p+i) 



(32) 
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The proof of Theorem [5] is given below. 

Proof. According to the deterministic error bound in Theorem[T] we study the de- 
A l (^2 T ^i) {Yi A i) ] A r X • Consider the Lipschitz function h(X) = 
, its Lipschitz constant L can be estimated by using the tri- 



viation of 



angle inequality: 



\h(X) - h(Y)\ < \\A 2 2 (X - Y) (V?A^ Ar 1 

<||A3||||x-r|| WivfAJW \\A?\\ 



<||A1 



A i 



Hence the Lipschitz constant satisfies L < ||A||| {V^ Ai)^ 
tion on V^Ax and then Proposition[5]implies that 

E [/i (V^At) | V^Ai] < ||Al|| {V 1 T A 1 ) f || A r 1 || F + 



(33) 
We condi- 



We define an event T as 

T = 



lA 2 ll 



< 



12r 
P 



■ t and 



< e^r + p _ 
~ P+ 1 



(34) 



According to Proposition [81 the event T happens except with probability 

Pr {T} < Ar p + t~ {p+1) . (35) 
Applying Proposition[7]to the function h (V 2 T Ai), given the event T, we have 



Pr 



K\ (V 2 T AA (VfAO+Ar 1 



> 



I A 2 1 



(^M) f F ll A r 1 L + 
(^M) t ||ll A r 1 || + 

(VfAi) f \\A^\\ -u | T} < e" 



lllj 



■« 2 /2 



(36) 
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According to the definition of the event T and the probability of T, we obtain 



Pr 



> 



l^llllAr 1 ! 



12r 



V 



ll A 2L|| A I 



in eV r + P 
! p+ 1 



II A 211 II A -1 II +P 

+ A, A-, 1 — - 

11 211 11 1 11 p+1 

e - u y 2 + U - P + t -( P +i) _ 



tu )■ < 



Therefore, 



Pr 



i+VtIEv 1 ] + 



1/2 y 

p + 1 



< 




a=l 



p + 



n \!/2 



Vi=r+1 



e -« 2 /2 + 4r P + i-(p+i), (37) 

Since Theorem Q] implies ||X - L\\ < A§ (V? A x ) (VfA^ A7 1 + ||A 2 ||, we 
obtain the deviation bound in Theorem [5] This completes the proof. □ 



5 Empirical Study 

We first evaluate the efficiency of the BRP based low-rank approximation £[]) for 
exact recovery of low-rank matrices. We consider square matrices of dimension 
n from 500 to 30000 with rank r from 50 to 500. Each matrix is generated by 
AB, wherein A and B are both n x r standard Gaussian matrices. Figured] shows 
that the recovery time is linearly increased w.r.t n. This is consistent with the 
r 2 (2n + r) + mnr flops required by (OQ). The relative error of each recovery is 
less than 10~ 14 . It also shows that a 30000 x 30000 matrix with rank 500 can be 
exactly recovered within 200 CPU seconds. This suggests the advantage of CO for 
large-scale applications. 

We then evaluate the effectiveness of CD and its power scheme modification 
© in low-rank approximation of full rank matrix with slowly decaying singular 
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Rank of recovery matrix 

Figure 1 : low-rank matrix recovery via BRP: the recovery time for matrices of 
different size and different rank. 

values. We generate a square matrix with size 1000, whose entries are indepen- 
dently sampled from a standard normal distribution with mean and variance 1, 
and then apply (0Q) (q = 0) and © with q = 1, 2, 3 to obtain approximations with 
rank varying from 1 to 600. We show the relative errors in Figure[2]and the relative 
error of the corresponding SVD approximation as a baseline. The results suggest 
that our method can obtain a nearly optimal approximation when q is sufficiently 
large (e.g., 2). 

At last, we evaluate the efficiency and effectiveness of BRP on low-rank com- 
pression of human face images from dataset FERET [5 |. We randomly selected 
700 face images of 100 individuals from FERET and built a 700 x 1600 data ma- 
trix, wherein the 1600 features are the 40 x 40 pixels of each image. We then 
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I 
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100 200 300 400 500 600 
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Figure 2: low-rank approximation via BRP: the relative approximation error for a 
1000 x 1000 matrix with standard normal distributed entries on different rank. 

obtain two rank-60 compressions of the data matrix by using SVD and the power 
modification of BRP based low-rank approximation © with q = 1, respectively. 
The compressed images and the corresponding time costs are shown in Figure [3] 
and its caption. It indicates that our method is able to produce compression with 
competitive quality in considerably less time than SVD. 

6 Conclusion 

In this paper, we consider the problem of fast low-rank approximation. A closed 
form solution for low-rank matrix approximation from bilateral random projec- 
tions (BRP) is introduced. Given an m x n dense matrix, the approximation can 
be calculated from (m + n)r random measurements in r 2 (2n + r) + mnr flops. 
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Figure 3: low-rank image compression via BRP on FERET: BRP compresses 700 
40 x 40 face images sampled from 100 individuals to a 700 x 1600 matrix with 
rank 60. Upper row: Original images. Middle row: images compressed by SVD 
(6.59s). Bottom row: images compressed by BRP (0.36s). 

Power scheme is applied for improving the approximation precision of matrices 
with slowly decaying singular values. We prove the BRP based low-rank approx- 
imation is nearly optimal. The experiments on both artificial and real datasets 
verifies the effectiveness and efficiency of BRP in both low-rank matrix recovery 
and approximation tasks. 
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